Issues in Large Vocabulary, Multilingual Speech Recognition

Authors

Lori Lamel

Martine Adda-Decker

Jean-Luc Gauvain

Abstract

In this paper we report on our activities in multilingual, speaker-independent, large vocabulary continuous speech recognition. The multilingual aspect of this work is of particular importance in Europe, where each country has its own national language. Our existing recognizer for American English and French has been ported to British English and German. It has been assessed in the context of the LRE SQALE project, whose objective was to experiment with installing in Europe a multilingual evaluation paradigm for the assessment of large vocabulary, continuous speech recognition systems. The recognizer makes use of phone-based continuous density HMMs for acoustic modeling and n-gram statistics estimated on newspaper texts for language modeling. The system has been evaluated on a dictation task with read, newspaper-based corpora: the ARPA Wall Street Journal corpus of American English, the WSJCAM0 corpus of British English, the BREF-Le Monde corpus of French, and the PHONDAT-Frankfurter Rundschau corpus of German. Under closely matched conditions, the average word accuracy across all 4 languages is 85%, obtained with an open-vocabulary test and 20k trigram systems (a 64k system for German).
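The abstract reports results as word accuracy. As a rough illustration only, the sketch below computes word accuracy as 1 minus the word error rate, using a Levenshtein alignment between a reference and a hypothesis transcript. This is the conventional definition and is an assumption here: the abstract does not state the exact scoring procedure used in SQALE. The function name `word_accuracy` and the example sentences are hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - WER, with WER = (substitutions + deletions + insertions) / N.

    N is the number of words in the reference. This is the conventional
    definition, assumed here; it is not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions align i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 1.0 - prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one reference word is dropped by the recognizer.
    print(word_accuracy("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, this measure can fall below zero when the hypothesis is much longer than the reference.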


Similar resources

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...


Acoustic Phonetic Decoding Oriented to Multilingual Speech Recognition in the Basque Context

The development of Large Vocabulary Continuous Speech Recognition systems involves issues such as Acoustic Phonetic Decoding, Language Modelling, and the development of appropriate Language Resources. In the state of the art, new techniques for reusing Language Resources of related, better-resourced languages are becoming of great interest, and there is also a growing interest in Multilingual systems. T...


Multilingual Speech Recognition for Information Retrieval in Indian Context

This paper analyzes various issues in building an HMM-based multilingual speech recognizer for Indian languages. The system was originally designed for Hindi and Tamil and adapted to incorporate Indian-accented English. Language-specific characteristics in the speech recognition framework are highlighted. The recognizer is embedded in information retrieval applications and hence several iss...


The GlobalPhone Project: Multilingual LVCSR with JANUS-3

This paper describes our recent effort in developing the GlobalPhone database for multilingual large vocabulary continuous speech recognition. In particular we present the current status of the GlobalPhone corpus, containing high quality speech data for the 9 languages Arabic, Chinese, Croatian, Japanese, Korean, Portuguese, Russian, Spanish, and Turkish. We also discuss the JANUS-3 toolkit and ho...


Development of Multilingual Acoustic Models in the GlobalPhone Project

This paper describes our recent effort in developing the GlobalPhone recognizer for multilingual large vocabulary continuous speech. Turkish. Based on five languages we developed a global phoneme set and built multilingual speech recognizers by varying the method of acoustic model combination. Context-dependent phoneme models are created using questions about languages and language groups. Results...



Publication date: 1995
